Scale and Concurrency of Massive File System Directories

Authors

Swapnil Patil

Christos Faloutsos

Gregory R. Ganger

Mahadev Satyanarayanan

Abstract

File systems store data in files and organize these files in directories. Over decades, file systems have evolved to handle increasingly large files: they distribute files across a cluster of machines, parallelize access to these files, and decouple data access from metadata access, and hence they provide scalable file access for high-performance applications. Sadly, most cluster-wide file systems lack any sophisticated support for large directories. In fact, most cluster file systems continue to use directories that were designed for humans, not for large-scale applications. The former use case typically involves hundreds of files and infrequent concurrent mutations in each directory, while the latter consists of tens of thousands of concurrent threads that simultaneously create large numbers of small files in a single directory at very high speeds. As a result, most cluster file systems exhibit very poor file create rates in a directory, due either to limited scalability from using a single centralized directory server or to reduced concurrency from using a system-wide synchronization mechanism. This dissertation proposes a directory architecture called GIGA+ that enables a directory in a cluster file system to store millions of files and sustain hundreds of thousands of concurrent file creations every second. GIGA+ makes two contributions: a concurrent

Similar resources

Swapnil Patil - Ph.D. Dissertation

Towards a Grid File System Based on a Large-Scale BLOB Management Service

This paper addresses the problem of building a grid file system for applications that need to manipulate huge data, distributed and concurrently accessed at a very large scale. In this paper we explore how this goal could be reached through a cooperation between the Gfarm grid file system and BlobSeer, a distributed object management system specifically designed for huge data management under h...

Using File-Grain Connectivity to Implement a Peer-to-Peer File System

Recent work has demonstrated a peer-to-peer storage system that locates data objects using O(log N) messages by placing objects on nodes according to pseudo-randomly chosen IDs. While elegant, this approach constrains system functionality and flexibility: files are immutable, directories and symbolic names are not supported, data location is fixed, and access locality is not exploited. This paper...

Scale and Concurrency of GIGA+: File System Directories with Millions of Files

We examine the problem of scalable file system directories, motivated by data-intensive applications requiring millions to billions of small files to be ingested in a single directory at rates of hundreds of thousands of file creates every second. We introduce a POSIX-compliant scalable directory design, GIGA+, that distributes directory entries over a cluster of server nodes. For scalability, ...

Multi-Directory Hashing

We present a new dynamic hashing scheme for disk-based databases, called Multi-Directory Hashing (MDH). MDH uses multiple hash directories to access a file. The size of each hash directory grows dynamically with the file size. The advantages of MDH are enhanced concurrency, improved bucket utilization and smaller total directory size than single-directory hashing. The expected utilization of MD...

Publication date: 2015
